Constructing Large Proposition Databases
Abstract
Using semantic parsing or related techniques, it is possible to extract knowledge from text in the form of predicate–argument structures, often called propositions. With the advent of massive corpora such as Wikipedia, it has become possible to systematically analyze a wide range of documents covering a significant part of human knowledge and to build large proposition databases from them. While most approaches rely on shallow syntactic analysis and do not capture the full meaning of a sentence, semantic parsing goes deeper and extracts more information from text with higher accuracy. This deeper analysis can be used to discover temporal and location-based propositions in documents; medical researchers could, for instance, find articles about the interaction of bacteria in a specific body part. Christensen et al. (2010) showed that using a semantic parser in information extraction can yield extractions with higher precision and recall in areas where shallow syntactic approaches have failed. This accuracy comes at the cost of longer parsing time. However, in recent years, statistical parsing, and especially semantic parsing, has become increasingly accurate and efficient.
This Master’s thesis describes the creation of multilingual proposition databases using generic semantic dependency parsing. Using a broad-domain corpus, Wikipedia, we extracted, processed, clustered, and evaluated a large number of propositions. We built an architecture that provides a complete pipeline covering text input, knowledge extraction, storage, and presentation of the resulting propositions. Furthermore, our system can handle large-scale extraction, wide domains, and multiple input languages. Wherever possible, information handling is automated so that manual labor is kept to a minimum.
Proposition databases like the one we constructed, combined with other lexical databases, are expected to be key components in semantic search technology, machine translation, and question-answering (Q&A) systems.
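To make the idea of a proposition database concrete, here is a minimal sketch, not the thesis's actual system. It assumes PropBank-style role labels (A0, A1, AM-TMP for time, AM-LOC for location), stores propositions in an in-memory SQLite database with a hypothetical two-table schema, and shows how temporal or location-based propositions could be retrieved by role. The class, function, table names and example sentences are all illustrative.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """A predicate plus its arguments, keyed by (assumed) PropBank-style role labels.

    This sketch allows one argument per role.
    """
    predicate: str
    sentence: str
    args: dict[str, str] = field(default_factory=dict)


def create_db() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE proposition (
            id        INTEGER PRIMARY KEY,
            predicate TEXT NOT NULL,
            sentence  TEXT NOT NULL
        );
        CREATE TABLE argument (
            prop_id  INTEGER NOT NULL REFERENCES proposition(id),
            role     TEXT NOT NULL,
            arg_text TEXT NOT NULL
        );
        -- Index so that lookups by role (e.g. AM-LOC) avoid a full table scan.
        CREATE INDEX argument_role ON argument(role);
        """
    )
    return conn


def store(conn: sqlite3.Connection, prop: Proposition) -> int:
    """Insert one proposition and its arguments; return the new proposition id."""
    cur = conn.execute(
        "INSERT INTO proposition (predicate, sentence) VALUES (?, ?)",
        (prop.predicate, prop.sentence),
    )
    prop_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO argument (prop_id, role, arg_text) VALUES (?, ?, ?)",
        [(prop_id, role, text) for role, text in prop.args.items()],
    )
    conn.commit()
    return prop_id


def find_by_role(conn: sqlite3.Connection, role: str, phrase: str) -> list[tuple[str, str]]:
    """Return (predicate, sentence) pairs whose `role` argument contains `phrase`."""
    return conn.execute(
        """
        SELECT p.predicate, p.sentence
        FROM proposition AS p
        JOIN argument AS a ON a.prop_id = p.id
        WHERE a.role = ? AND a.arg_text LIKE ?
        ORDER BY p.id
        """,
        (role, f"%{phrase}%"),
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    # Illustrative sentences and role labels, not output from the thesis's parser.
    store(conn, Proposition(
        "interact",
        "Streptococcus bacteria interact with other microbes in the mouth.",
        {"A0": "Streptococcus bacteria", "A1": "with other microbes", "AM-LOC": "in the mouth"},
    ))
    store(conn, Proposition(
        "launch",
        "Wikipedia was launched in 2001.",
        {"A1": "Wikipedia", "AM-TMP": "in 2001"},
    ))
    print(find_by_role(conn, "AM-LOC", "mouth"))  # location-based query
    print(find_by_role(conn, "AM-TMP", "2001"))   # temporal query
```

Keeping arguments in a separate table keyed by role means a location or time query is a single indexed join, however many arguments a predicate carries.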
Similar resources
Constructing a Legal Database on Quixote
Legal reasoning is one application of large-scale knowledge information processing, where artificial intelligence, natural language processing, databases and other technologies are integrated. It is the target for the next generation of databases. In order to investigate whether or not the deductive object-oriented database (DOOD) language/system QUIXOTE is effective in legal reasoning, we are bot...
Comparison among Suffix Array Construction Algorithms
A suffix array is a compact data structure for searching for matching strings in text databases. It is an array of pointers that stores all suffixes of a text in lexicographic order. Because its memory requirement is smaller than that of tree structures, it is effective for large databases. Moreover, suffix array construction is used in the Block Sorting compression scheme. We compare algorithms for constructing suffix a...
Focused Entailment Graphs for Open IE Propositions
Open IE methods extract structured propositions from text. However, these propositions are neither consolidated nor generalized, and querying them may lead to insufficient or redundant information. This work suggests an approach to organize open IE propositions using entailment graphs. The entailment relation unifies equivalent propositions and induces a specific-to-general structure. We create...
Tailoring Pattern Databases for Unsolvable Planning Instances
There has been an astounding improvement in domain-independent planning for solvable instances over the last decades, and planners have become increasingly efficient at constructing plans. However, this advancement has not been matched by a similar improvement for identifying unsolvable instances. In this paper, we specialise pattern databases for dead-end detection and, thus, for detecting unsol...
The Foundations: Logic and Proof, Sets, and Functions
Learning to construct good mathematical proofs takes years. There is no algorithm for constructing the proof of a true proposition (there is actually a deep theorem in mathematical logic that says this). Instead, the construction of a valid proof is an art, honed after much practice. There are two problems for the beginning student—figuring out the key ideas in a problem (what is it that really...